Analyzing the Escherichia coli Gene Expression Data by a Multilayer Adjusted Tree Organizing Map
Abstract
DNA microarray technology gives biologists access to thousands of array data sets. Discovering the functional relationships between genes and their involvement in biological processes depends on the ability to process large amounts of array data efficiently and to analyze them quantitatively. Clustering algorithms are among the popular tools that can help biologists achieve these goals. Although some existing research projects have applied clustering algorithms to biological data, none of them has examined the Escherichia coli (E. coli) gene expression data. This paper proposes a clustering algorithm called Multilayer Adjusted Tree Organizing Map (MATOM) to analyze the E. coli gene expression data. In a semi-supervised manner, MATOM constructs a multilayer map and, at the same time, removes noisy data from the previously trained maps in order to improve the training process. The paper then presents the clustering results that MATOM and other existing clustering algorithms produce on the E. coli gene expression data, together with a new evaluation method to assess them. The results show that MATOM performs best in terms of the percentage of genes that are clustered correctly.
Similar resources
Cloning and sequencing of ompf Salmonella typhi Salmonella ompf gene in Escherichia coli Origami
Background and Aim: Salmonella Typhi is a Gram-negative bacillus of the family Enterobacteriaceae that causes gastrointestinal diseases such as typhoid. This bacterium has a special structure and various genes, including the ompf gene (outer membrane protein). Recent studies have shown the possibility of using ompf in the development of a diagnostic tuberculosis vaccine. Therefore, the aim of...
Gene Expression Data Mining for Functional Genomics
Methods for supervised and unsupervised clustering and machine learning were studied in order to automatically model relationships between gene expression data and gene functions of the microorganism Escherichia coli. From a pre-selected subset of 265 genes (belonging to 3 functional groups) the function has been predicted with an accuracy higher than 50 % by various data mining methods describ...
Gene Expression Data Mining for Functional Genomics using Fuzzy Technology
Methods for supervised and unsupervised clustering and machine learning were studied in order to automatically model relationships between gene expression data and gene functions of the microorganism Escherichia coli. From a pre-selected subset of 265 genes (belonging to 3 functional groups) the function has been predicted with an accuracy of 63-71 % by various data mining methods described in ...
A COMPARATIVE STUDY BETWEEN EXPRESSION OF A SYNTHETIC GENE OF HUMAN BASIC FIBROBLAST GROWTH FACTOR (hbFGF) AND ITS RELATED cDNA IN ESCHERICHIA COLI
The gene encoding the human basic fibroblast growth factor (hbFGF) has already been chemically synthesized and cloned in the pET-3a expression vector (Pasteur Institute of Iran). In the present study, we compared the level of expression of this synthetic hbFGF and its related cDNA in Escherichia coli. The pBR322-cDNA of hbFGF supplied by Dr. Seno (from Molecular Biology Dept, Okaido prefectural uni...
Synthesis and Expression of Modified bFGF Gene in Escherichia coli Cells
A new strategy for constructing a synthetic gene encoding human basic fibroblast growth factor, comprising DNA annealing-ligation and augmentation by polymerase chain reaction, was introduced. The sequence of the gene and the corresponding amino acid chain were modified in order to increase the stability of the protein. First, 300 bp and 160 bp fragments of the gene were assembled from 18 oligonucleotid...